Appendices for "Pruning Randomly Initialized Neural Networks with Iterative Randomization"
We consider a target neural network $f: \mathbb{R}^{d_0} \to \mathbb{R}^{d_l}$ of depth $l$, which is described as follows. Similar to the previous works [6, 7], we assume that $g(x)$ is twice as deep as the target network $f(x)$. Thus, $g(x)$ can be described as
$$g(x) = G_{2l}\,\sigma\big(G_{2l-1}\,\sigma(\cdots G_1(x))\big), \qquad (2)$$
where $G_j$ is a $\tilde{d}_j \times \tilde{d}_{j-1}$ matrix ($\tilde{d}_j \in \mathbb{N}_{\geq 1}$ for $j = 1, \ldots, 2l$) with $\tilde{d}_{2i} = d_i$. Under this re-sampling assumption, we describe our main theorem as follows.

Theorem A.1 (Main Theorem). Fix $\varepsilon, \delta > 0$, and assume that $\|F_i\|_{\mathrm{Frob}} \leq 1$. Let $R \in \mathbb{N}$, and assume that each element of $G_i$ can be re-sampled with replacement from the uniform distribution $U[-1, 1]$ up to $R - 1$ times. If $n \geq 2\log(1/\delta)$ holds, then with probability at least $1 - \delta$, we have
$$|\alpha - X_i| \leq \varepsilon, \qquad (5)$$
for some $i \in \{1, \ldots, n\}$.
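To make the construction in Eq. (2) concrete, the following is a minimal numpy sketch of the doubled-depth network $g(x)$ and of the per-element re-sampling operation the theorem allows. The choice $\sigma = \mathrm{ReLU}$, the dimensions, and the helper names are illustrative assumptions, not part of the statement above.

```python
import numpy as np

# A minimal sketch of the doubled-depth network g(x) from Eq. (2), assuming
# sigma is ReLU and using small illustrative dimensions; the even-indexed
# widths d_tilde[2i] must equal the target widths d_i, the odd ones are free.
rng = np.random.default_rng(0)

d = [4, 8, 3]                      # target dims d_0, d_1, d_2 (depth l = 2)
d_tilde = [4, 16, 8, 16, 3]        # \tilde{d}_0 .. \tilde{d}_{2l}, with \tilde{d}_{2i} = d_i

# Each G_j is a (\tilde{d}_j x \tilde{d}_{j-1}) matrix sampled from U[-1, 1].
G = [rng.uniform(-1.0, 1.0, size=(d_tilde[j], d_tilde[j - 1]))
     for j in range(1, len(d_tilde))]

def g(x, G):
    """Compute g(x) = G_{2l} sigma(G_{2l-1} sigma(... G_1 x))."""
    h = x
    for j, Gj in enumerate(G):
        h = Gj @ h
        if j < len(G) - 1:          # no activation after the last layer
            h = np.maximum(h, 0.0)  # sigma = ReLU (an assumption here)
    return h

def resample_entry(Gj, i, k, rng):
    """Re-sample one entry of G_j from U[-1, 1], as allowed up to R-1 times."""
    Gj[i, k] = rng.uniform(-1.0, 1.0)

x = rng.standard_normal(d[0])
resample_entry(G[0], 0, 0, rng)    # one example re-sampling step
print(g(x, G).shape)               # (3,), matching d_l
```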
Meta-ticket: Finding optimal subnetworks for few-shot learning within randomly initialized neural networks
Few-shot learning for neural networks (NNs) is an important problem that aims to train NNs from only a small amount of data. The main challenge is avoiding overfitting, since over-parameterized NNs can easily overfit to such a small dataset. MAML (Finn et al., 2017) tackles this challenge by meta-learning, which learns how to learn from a few examples by using various tasks. On the other hand, a conventional approach to avoiding overfitting is to restrict the hypothesis space by imposing sparse NN structures, such as convolution layers in computer vision. However, although such manually designed sparse structures are sample-efficient for sufficiently large datasets, they are still insufficient for few-shot learning. The following questions then naturally arise: (1) Can we find sparse structures effective for few-shot learning by meta-learning? (2) What benefits would this bring in terms of meta-generalization?
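Since the abstract leans on MAML's learn-to-learn loop, here is a rough first-order MAML-style sketch on toy linear-regression tasks. It is not the Meta-ticket method itself (which meta-learns sparse structures rather than weight initializations), and every name and hyperparameter below is an assumption chosen only for illustration.

```python
import numpy as np

# A toy first-order MAML-style loop: meta-learn an initialization w that
# adapts to new few-shot linear-regression tasks with a single gradient step.
rng = np.random.default_rng(0)
dim, inner_lr, outer_lr, k_shot = 5, 0.1, 0.01, 4

w = rng.standard_normal(dim) * 0.1              # meta-parameters (initialization)

def sample_task(rng):
    """Each task is a random linear function y = <w_task, x>."""
    return rng.standard_normal(dim)

def grad_mse(w, X, y):
    return 2.0 * X.T @ (X @ w - y) / len(y)

for step in range(1000):
    w_task = sample_task(rng)
    # Few-shot support set for the inner adaptation step.
    Xs = rng.standard_normal((k_shot, dim)); ys = Xs @ w_task
    w_adapted = w - inner_lr * grad_mse(w, Xs, ys)       # inner step
    # Query set for the outer (meta) update; first-order: ignore the
    # derivative of w_adapted with respect to w.
    Xq = rng.standard_normal((k_shot, dim)); yq = Xq @ w_task
    w = w - outer_lr * grad_mse(w_adapted, Xq, yq)       # outer step

print("meta-trained initialization norm:", np.linalg.norm(w))
```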
Function Approximation with Randomly Initialized Neural Networks for Approximate Model Reference Adaptive Control
Lekang, Tyler, Lamperski, Andrew
Classical results in neural network approximation theory show how arbitrary continuous functions can be approximated by networks with a single hidden layer, under mild assumptions on the activation function. However, the classical theory does not give a constructive means to generate the network parameters that achieve a desired accuracy. Recent results have demonstrated that for specialized activation functions, such as ReLUs and some classes of analytic functions, high accuracy can be achieved via linear combinations of randomly initialized activations. These recent works utilize specialized integral representations of target functions that depend on the specific activation functions used. This paper defines mollified integral representations, which provide a means to form integral representations of target functions using activations for which no direct integral representation is currently known. The new construction enables approximation guarantees for randomly initialized networks for a variety of widely used activation functions.
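As a concrete illustration of approximating a target by a linear combination of randomly initialized activations, here is a hedged numpy sketch that fits only the output weights of random ReLU features by least squares; the target function, widths, and sampling ranges are assumptions, not taken from the paper.

```python
import numpy as np

# Fit a 1-D target with a linear combination of randomly initialized ReLU
# units: only the output coefficients are trained (least squares); the
# random inner weights and biases stay fixed.
rng = np.random.default_rng(0)

n_features, n_train = 200, 256
target = lambda x: np.sin(3.0 * x) + 0.5 * x           # function to approximate

# Randomly initialized hidden units: ReLU(a * x + b).
a = rng.uniform(-2.0, 2.0, n_features)
b = rng.uniform(-2.0, 2.0, n_features)
features = lambda x: np.maximum(np.outer(x, a) + b, 0.0)

x_train = rng.uniform(-2.0, 2.0, n_train)
Phi = features(x_train)                                 # (n_train, n_features)
coef, *_ = np.linalg.lstsq(Phi, target(x_train), rcond=None)

x_test = np.linspace(-2.0, 2.0, 100)
err = np.max(np.abs(features(x_test) @ coef - target(x_test)))
print(f"max approximation error on [-2, 2]: {err:.3f}")
```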
Pruning Randomly Initialized Neural Networks with Iterative Randomization
Chijiwa, Daiki, Yamaguchi, Shin'ya, Ida, Yasutoshi, Umakoshi, Kenji, Inoue, Tomohiro
Pruning the weights of randomly initialized neural networks plays an important role in the context of the lottery ticket hypothesis. Ramanujan et al. (2020) empirically showed that merely pruning the weights, rather than optimizing the weight values, can achieve remarkable performance. However, to reach the same level of performance as weight optimization, the pruning approach requires more parameters in the network before pruning, and thus more memory. To overcome this parameter inefficiency, we introduce a novel framework for pruning randomly initialized neural networks by iteratively randomizing the weight values (IteRand). Theoretically, we prove an approximation theorem in our framework, which indicates that the randomizing operations are provably effective in reducing the required number of parameters. We also empirically demonstrate the parameter efficiency in multiple experiments on CIFAR-10 and ImageNet.
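The following is a minimal sketch of the iterative-randomization idea on a single linear layer, assuming an edge-popup-style straight-through score update for the pruning mask and periodic re-sampling of the currently pruned weights. The toy task and all hyperparameters are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

# Prune a fixed random weight vector by training scores only, and
# periodically re-randomize the weights that are currently pruned.
rng = np.random.default_rng(0)

dim, k, lr, steps, period = 32, 8, 0.05, 2000, 100
w_star = rng.standard_normal(dim)              # ground-truth linear target
W = rng.uniform(-1.0, 1.0, dim)                # fixed random weights (never trained)
s = rng.uniform(0.0, 1.0, dim)                 # pruning scores (trained)

def mask(s, k):
    """Keep the k weights with the largest scores."""
    m = np.zeros_like(s)
    m[np.argsort(s)[-k:]] = 1.0
    return m

for t in range(steps):
    x = rng.standard_normal(dim)
    y = w_star @ x
    m = mask(s, k)
    pred = (W * m) @ x
    grad_eff = 2.0 * (pred - y) * x            # dL/d(effective weights)
    s -= lr * grad_eff * W                     # straight-through score update
    # Iterative-randomization step: re-sample only the pruned weights.
    if (t + 1) % period == 0:
        pruned = m == 0.0
        W[pruned] = rng.uniform(-1.0, 1.0, pruned.sum())

m = mask(s, k)
print("kept weights:", int(m.sum()), "of", dim)
```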